# Cross-modal interaction
## Phi 4 Multimodal Instruct Onnx
- License: MIT
- Description: ONNX version of the Phi-4 multimodal model, quantized to int4 precision with accelerated inference via ONNX Runtime; supports text, image, and audio inputs.
- Tags: Multimodal Fusion, Other
- Publisher: microsoft
- Stats: 159 · 66
## Mobilevlm 3B
- License: Apache-2.0
- Description: MobileVLM is a fast and powerful multimodal vision-language model designed specifically for mobile devices, supporting efficient cross-modal interaction.
- Tags: Text-to-Image, Transformers
- Publisher: mtgv
- Stats: 346 · 13
## Mobilevlm 1.7B
- License: Apache-2.0
- Description: MobileVLM is a lightweight multimodal vision-language model designed specifically for mobile devices, supporting efficient image understanding and text generation tasks.
- Tags: Text-to-Image, Transformers
- Publisher: mtgv
- Stats: 647 · 15